$bucket#
Categorizes input documents by an expression you define, such as a specific field, into groups called buckets. Buckets are defined by bucket boundary values you set. For example, you could categorize temperature values into buckets that are defined by timestamp boundaries.
This aggregation stage outputs one document per bucket. In each bucket document, the _id is the bucket's lower boundary, and the remaining fields hold either the count of documents in the bucket or the custom fields you define in the output.
Note: The $bucket stage has a memory limit. If the stage exceeds the limit, the operation returns an error.
Syntax#
{
  "$bucket": {
    "groupBy": "<expression>",
    "boundaries": [ "<lowerbound1>", "<lowerbound2>", "..." ],
    "default": "<literal>",
    "output": {
      "<output1>": { "<$accumulator operator>": "<$expression>" },
      "<outputN>": { "<$accumulator operator>": "<$expression>" }
    }
  }
}
| Field | Type | Description | Required |
|---|---|---|---|
| "groupBy" | Expression | Within quotation marks, enter $ followed by the name of the field you want to group by, for example, "$temp" for temperature values. | Required |
| "boundaries" | Array | Enter an array of boundary values. The data range between two adjacent boundaries is a bucket. For example, the array [ 0, 5, 10 ] creates two buckets: [0, 5) and [5, 10). Lower boundaries are inclusive and upper boundaries are exclusive, so a value of 5 falls into the second bucket and a value of 10 falls into neither. You must specify at least two boundaries. The values must be in ascending order and of the same type, except that numeric types can be mixed. | Required |
| "default" | Literal | Enter a literal that serves as the _id of a default bucket, which receives any document that does not fall between any of your bucket boundaries. If you do not define a default bucket, any input document that does not fall within one of the bucket ranges causes an error that stops the operation. The default value must be less than the lowest boundary value, or greater than or equal to the highest boundary value. The default value can be of a different type than the boundary array values. | Optional |
| "output" | Document | Name a custom field, then define an accumulator operator and expression pair to compute the data you want to output for the custom field. You can define multiple custom fields. If you do not specify an output document, the operation returns a "count" field that contains the number of documents in each bucket. If you define an output document, only your custom fields are returned. If you also require a count, add a custom field with an operator-expression pair for it, for example: { "count": {"$sum": 1} }. | Optional |
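The boundary and default rules in the table can be illustrated with a small plain-Python emulation. This is only a sketch of the documented behavior, not the stage's actual implementation. The function name `assign_bucket` and the `"Other"` default are hypothetical names chosen for this example.

```python
from bisect import bisect_right

_NO_DEFAULT = object()  # sentinel: the caller did not supply a default bucket


def assign_bucket(value, boundaries, default=_NO_DEFAULT):
    """Return the _id of the bucket that `value` falls into.

    Illustrative emulation of the documented rules only.
    """
    if len(boundaries) < 2:
        raise ValueError("at least two boundaries are required")
    if list(boundaries) != sorted(boundaries):
        raise ValueError("boundaries must be in ascending order")

    # Inside the overall range: lowest boundary inclusive, highest exclusive.
    if boundaries[0] <= value < boundaries[-1]:
        # bisect_right counts the boundaries <= value; the last of those
        # is the lower (inclusive) boundary of the matching bucket.
        return boundaries[bisect_right(boundaries, value) - 1]

    # Outside every bucket: use the default, or fail if none was defined.
    if default is _NO_DEFAULT:
        raise ValueError(f"{value!r} is outside all buckets and no default is set")
    return default


boundaries = [0, 5, 10]
ids = [assign_bucket(v, boundaries, default="Other") for v in (0, 4.9, 5, 9.99, 10, -1)]
print(ids)  # [0, 0, 5, 5, 'Other', 'Other']
```

The value 5 lands in the second bucket because lower boundaries are inclusive, while 10 and -1 fall through to the default bucket. Calling `assign_bucket(10, boundaries)` without a default raises an error, mirroring the behavior described for a missing default bucket.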
Example#
The following example computes the average temperature of each bucket and counts the number of documents in each bucket. The bucket boundaries are timestamps.
The data aggregation has the following stages:
1. $match: Filters for documents with a "temp" field.
2. $project: Converts dates to timestamp strings.
3. $bucket: Groups the documents by the projected "$tsAsString" field values into buckets that are defined by specific timestamps. Values that do not fall within a defined boundary range land in the "Other" default bucket.
Sample Request#
[
  { "$match": { "temp": { "$exists": true } } },
  {
    "$project": {
      "temp": 1,
      "_tsMetadata": 1,
      "tsAsString": {
        "$dateToString": { "date": "$_ts", "format": "%Y-%m-%dT%H:%M:%S:%L%Z" }
      }
    }
  },
  {
    "$bucket": {
      "groupBy": "$tsAsString",
      "boundaries": [
        "2023-05-09T00:00:00Z",
        "2023-05-09T01:00:00Z",
        "2023-05-10T00:00:00Z",
        "2023-05-11T00:00:00Z",
        "2023-05-12T00:00:00Z"
      ],
      "default": "Other",
      "output": {
        "count": { "$sum": 1 },
        "avgTemp": { "$avg": "$temp" }
      }
    }
  }
]
Sample Response#
{
  "_list": [
    { "avgTemp": 520.5, "count": 24, "_id": "2023-05-10T00:00:00Z" },
    { "avgTemp": 374.7, "count": 10, "_id": "2023-05-11T00:00:00Z" },
    { "avgTemp": 611.6470588235294, "count": 17, "_id": "Other" }
  ]
}
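To make the grouping and the "count" and "avgTemp" outputs easier to follow, the $bucket step of this example can be approximated in plain Python. This is a rough sketch under stated assumptions, not the service's implementation: the input documents are invented for illustration, the helper name `bucket_avg_temp` is hypothetical, and the projected strings are assumed to share the format of the boundary strings so that plain string comparison orders them chronologically.

```python
from bisect import bisect_right
from collections import defaultdict

# Boundaries copied from the sample request.
BOUNDARIES = [
    "2023-05-09T00:00:00Z",
    "2023-05-09T01:00:00Z",
    "2023-05-10T00:00:00Z",
    "2023-05-11T00:00:00Z",
    "2023-05-12T00:00:00Z",
]

# Hypothetical input documents, invented for this sketch.
docs = [
    {"tsAsString": "2023-05-10T08:30:00Z", "temp": 500.0},
    {"tsAsString": "2023-05-10T17:45:00Z", "temp": 541.0},
    {"tsAsString": "2023-05-11T03:10:00Z", "temp": 374.0},
    {"tsAsString": "2023-05-12T00:00:00Z", "temp": 610.0},  # highest boundary is exclusive
    {"tsAsString": "2023-05-08T23:59:59Z", "temp": 620.0},  # below the lowest boundary
]


def bucket_avg_temp(docs, boundaries, default):
    """Group docs by tsAsString and compute count and average temp per bucket."""
    groups = defaultdict(list)
    for doc in docs:
        key = doc["tsAsString"]
        if boundaries[0] <= key < boundaries[-1]:
            # The last boundary <= key is the bucket's inclusive lower bound.
            bucket_id = boundaries[bisect_right(boundaries, key) - 1]
        else:
            bucket_id = default
        groups[bucket_id].append(doc["temp"])

    # One output document per non-empty bucket, default bucket last.
    result = []
    for bucket_id in [*boundaries[:-1], default]:
        temps = groups.get(bucket_id)
        if temps:
            result.append(
                {"_id": bucket_id, "count": len(temps), "avgTemp": sum(temps) / len(temps)}
            )
    return result


result = bucket_avg_temp(docs, BOUNDARIES, "Other")
for row in result:
    print(row)
```

In this sketch, buckets that receive no documents produce no output document, which is consistent with the sample response listing only three of the possible buckets. The document stamped exactly at the highest boundary goes to "Other" because upper boundaries are exclusive.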